1.7. Hyperparameters
1.8. Imbalanced Data
1.9. Multi-Class
The Iris dataset is useful for demonstrating SVMs, but it is a bit "too small to be representative of real world machine learning tasks"1.
Furthermore, when comparing SVMs to other methods on such a small dataset, we will probably do well no matter how our hyperparameters are set. So for this lecture/workbook I'll demonstrate SVMs on larger (yet still manageable) datasets.
The Abalone Dataset2
| Name | Data Type | Meas. | Description |
|---|---|---|---|
| Sex | nominal | | M, F, and I (infant) |
| Length | continuous | mm | longest shell measurement |
| Diameter | continuous | mm | perpendicular to length |
| Height | continuous | mm | with meat in shell |
| Whole weight | continuous | grams | whole abalone |
| Shucked weight | continuous | grams | weight of meat |
| Viscera weight | continuous | grams | gut weight (after bleeding) |
| Shell weight | continuous | grams | after being dried |
| Rings | integer | | +1.5 gives the age in years |

| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
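
The description table says that adding 1.5 to the ring count gives the age in years. As a minimal sketch of that conversion, the snippet below parses the five sample rows shown above (copied by hand, not the full dataset) using only the standard library. The names `SAMPLE`, `load_rows`, and the `Age` key are illustrative assumptions and do not come from the original workbook.

```python
import csv
import io

# The five sample rows shown above, copied by hand (not the full dataset).
SAMPLE = """\
Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.150,15
M,0.350,0.265,0.090,0.2255,0.0995,0.0485,0.070,7
F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.210,9
M,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.155,10
I,0.330,0.255,0.080,0.2050,0.0895,0.0395,0.055,7
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Sex stays a string label; Rings is an integer count.
        row = {"Sex": raw["Sex"], "Rings": int(raw["Rings"])}
        for name, value in raw.items():
            if name not in ("Sex", "Rings"):
                row[name] = float(value)
        # Age in years, per the description table: rings + 1.5.
        row["Age"] = row["Rings"] + 1.5
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
ages = [row["Age"] for row in rows]
print(ages)  # [16.5, 8.5, 10.5, 11.5, 8.5]
```

With a DataFrame like the one displayed above, the same step would be a single column addition; the standard-library version is used here only so the sketch runs without the data file.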